{
 "cells": [
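  {
   "cell_type": "markdown",
   "id": "81b37b4e",
   "metadata": {},
   "source": [
    "## Logistic Regression\n",
    "\n",
    "The logistic regression (LR) model simply applies a logistic function on top of linear regression, yet it is precisely this logistic function that makes logistic regression one of the most important concepts and tools in machine learning.\n",
    "\n",
    "### 1. The Logistic Regression Model\n",
    "\n",
    "### 1.1 Analysis of the Linear Regression Model\n",
    "\n",
    "To understand logistic regression, we first need to be clear about what regression is: a model $y=f(x)$ that, given the value of the independent variable $x$, predicts the dependent variable $y$, which here is the class the data belongs to."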
   ]
  },
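  {
   "cell_type": "markdown",
   "id": "c7a22e14",
   "metadata": {},
   "source": [
    "![fig1.gif](images/fig1.gif)\n",
    "\n",
    "We start with **linear regression**, where $y=f(x)$ is a straight line. In the figure, $X$ denotes the data points (tumor size) and $Y$ is the classification of the tumor. Panel (1.a) shows the ideal case: the data are well clustered, the linear regression output is small for small tumors, and it grows as the tumor size approaches the other cluster.\n",
    "\n",
    "When outliers are present, however, the linear regression model shows **poor robustness**: the fitted line now classifies the two circled points with smaller $X$ values as \"malignant\"."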
   ]
  },
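  {
   "cell_type": "markdown",
   "id": "1aa303de",
   "metadata": {},
   "source": [
    "### 1.2 Basics of the Logistic Regression Model\n",
    "\n",
    "* [**Logistic regression**](http://www.uml.org.cn/ai/202009211.asp) is one way to fix this. The problem above is caused by noise points at the two extremes, so we introduce a new function, the logistic function.\n",
    "\n",
    "* Intuitively, as I understand it, we want the function to change as slowly as possible for $z \\gg 0$ and $z \\ll 0$, so that even if noise points fall in those ranges, the model does not sacrifice its ability to separate the core data in order to fit them. At the same time, the function should stay sensitive to the difference between the two classes, i.e. have a high rate of change (in this problem, it should change quickly around intermediate sizes).\n",
    "\n",
    "So we choose the **sigmoid function**:\n",
    "$$\n",
    "g(z) = \\frac{1}{1+e^{-z}}\n",
    "$$\n",
    "\n",
    "The function has the following shape:"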
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "fd43818e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEICAYAAAC3Y/QeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAiFklEQVR4nO3deZgU1b3/8feXXQUB2QUUVMAtJooRxQ3XIGpwQcUFl6g8JJdE/em9bvdGo1lvYhKjRmIQNxJhiEiIIi73MtEoIsqPRTZZBEFAXABBFBzme/84hTRtz0wzUz3Vy+f1PPX0Ume6PlMz8+2a06dOmbsjIiKlpUHSAUREpP6p+IuIlCAVfxGREqTiLyJSglT8RURKkIq/iEgJUvGXemNml5rZC/m2XTMrN7NrqlhnZvaIma0zszdylzLjtp8zsyvqc5tSOkzj/CVOZnYc8N/AIcA2YD5wvbtPTzRYNcysHBjt7iMzrDseeBLo5e6f5TDDncAB7n5ZrrYhkqpR0gGkeJjZnsAzwPeBMqAJcDywJclcdbQvsCyXhV8kCer2kTj1BHD3J919m7t/7u4vuPtsADO70sz+tb2xmZ1uZgvNbIOZ/dHM/rm9+yVq+6qZ/c7M1pvZUjPrGz2/wszWpnaJmFlLM3vczD40s+Vm9p9m1qCK7Z5mZgui7d4PWKZvxsyuBkYCx5jZJjP7SfprRe3czA6I7j9qZg+Y2bNmttHMppnZ/iltDzGzF83sEzP7wMxuM7P+wG3ARdF2ZkVtv+qOMrMG0fe0PPreHzezltG6blGGK8zsPTP7yMxur/VPUUqCir/E6R1gm5k9ZmZnmFnrqhqaWVvgb8CtQBtgIdA3rVkfYHa0/q/AGODbwAHAZcD9ZtY8ansf0BLYDzgRuBy4qortPgX8J9AWWAIcmymjuz8MDAOmuntzd7+jph0QuRj4CdAaWAz8LNp2C+AlYDKwd/R9/I+7TwZ+DoyNtvPNDK95ZbScFH2PzYH709ocB/QCTgF+bGYHZZlXSpCKv8TG3T8lFCAH/gx8aGYTzaxDhuYDgLnuPt7dK4A/AGvS2rzr7o+4+zZgLNAVuMvdt7j7C8BW4AAzawhcBNzq7hvdfRlwDzCkiu3Oc/e/ufuXwO8zbLeuxrv7G9H39RfgW9HzZwFr3P0ed/8iyjoty9e8FPituy91902EN83BZpbadfuT6L+tWcAsINObiAig4i8xc/f57n6lu3cBDiUc4f4+Q9O9gRUpX+fAyrQ2H6Tc/zxql/5cc8IRfBNgecq65UDnLLe7IkO7ukh9M9kcZYTw5rWklq+5N1///hoBqW+sVW1X5GtU/CVn3H0B8CjhTSDdaqDL9gdmZqmPd9FHwJeED2e32wd4v4rtdk3bbtcM7aryGbB7ytd33IWvXQHsX8W6mobdreLr318FO79BimRNxV9iY2YHmtmNZtYletyV0P/9eobmzwLfMLNzoq6LfwN2pZB+JeoWKgN+ZmYtzGxf4P8Bo6vY7iFmdl603R/t4nZnRV//LTNrBty5C1/7DNDRzK43s6ZR1j7Rug+Abts/pM7gSeAGM+sefc6x/TOCil3YvshXVPwlThsJH9JOM7PPCEX/beDG9Ibu/hFwAeGcgI+Bg4E3qf2w0B8SjsqXAv8ifEA8qprt/jLabg/g1Ww34u7vAHcRPrhdFG0r26/dCJwGnE3oollE+AAXYFx0+7GZzcjw5aOAJ4CXgXeBLwjfs0it6CQvyQvREe9K4FJ3n5J0HpFipyN/SYyZfcfMWplZU8I4dyNzF5GIxKzG4m9mo6KTSt6uYr2Z2R/MbLGZzTazI+KPKUXqGMLol48IXSHnuPvnyUYSKQ01dvuY2QnAJuBxd//aqA0zG0DoexxA6O+91937pLcTEZH8UeORv7u/DHxSTZOBhDcGd/fXgVZm1imugCIiEr84JnbrzM4nyayMnlud3tDMhgJDAZo1a9Z7n332iWHzuVVZ
WUmDBvn/0YhyxqsQchZCRgg5zRqwbZuxbVsDtm2DykqLlh33t20Lj9133Lpnehy+phTtv/8mlixZ+JG7t6vra8VR/DP9FDL2Jbn7Q8BDAL169fKFCxfGsPncKi8vp1+/fknHqJFyxqsQcuZDRnf4+GNYvhyWLQu3y5fDqlWwdm1YVq/+ko0bG1f7Oo0aQcuWYdl9d9htt7A0a7bjNv1+kybQuHH42l1dGjQIi9mO25kz/z+9ex+OGV8t29fV5v72ZTtLq5S1WdelCzRqZMuJQRzFfyU7nyHZhXA2oogUkfffh9mzYd68nZdPP925XfPm0LkzdOgAhx4KBx64lsMP70z79tC+PbRuDXvuGZaWLcNt06ZfL4D1bwPHH590hvoTR/GfCAw3szGED3w3uPvXunxEpHB8/jnMmAGvvw5Tp4bb91Mmy+jQAQ4+GIYMgQMOgG7dYN99w9K69c6FvLx8Ef36ZZpmSZJUY/E3syeBfkBbM1sJ3AE0BnD3EcAkwkifxYTJpL42ja6I5Dd3WLAAJk8Oy8svwxdfhHXdu8MJJ8DRR8Phh4ei36ZNsnml7mos/u5+cQ3rnTAvi4gUmHnzoKwsLPPnh+cOOgiGDYN+/ULB75BpQm4peLqMo0iJ+ewzePJJGDEC3nordNGceCIMHw5nnQUFMAhPYqDiL1IiVqyA3/wGHn00fEh76KFw771w4YXQsVbzqUohU/EXKXLvvgu/+EUo+u5w0UXw/e9D3775MMJGkqLiL1KkNmyAO++E++8P48+vuQZuvjmMyBFR8RcpMpWV4Sj/1lvhww9D0b/jjjD2XmQ7FX+RIrJsGVxxRRiq2bcvPPccHKF5diUDFX+RIuAOo0eHETvuMGoUXHml+vSlair+IgXus89C186YMXDccfDEE+GMW5Hq5P+UgCJSpZUrw9m3ZWXw059CebkKv2RHR/4iBWrBghZccgls2gQTJ8KZZyadSAqJjvxFCtCkSXDddd+iaVN47TUVftl1OvIXKTDPPAPnnw/77ruZf/2rBe3bJ51ICpGO/EUKyMSJcN55cNhhcM89s1T4pdZU/EUKxD/+AYMGhWmVX3wRWrSoSDqSFDB1+4gUgOnTw5w83/oWvPBCuAKWSF3oyF8kzy1fDmefHWbefOYZFX6Jh478RfLYhg1hJM8XX8CUKaiPX2Kj4i+Sp7ZtC109CxeGSysedFDSiaSYqPiL5Klf/AKefx7+9Cc45ZSk00ixUZ+/SB565ZUwDfMll8C11yadRoqRir9InvnkE7j0UujeHR58UDNzSm6o20ckj7jD1VfDmjVh2oY990w6kRQrFX+RPDJqFEyYAPfcA0cemXQaKWbq9hHJE2vWwE03hSmar78+6TRS7FT8RfLEDTfA5s1hdE8D/WVKjulXTCQPTJoUrsR1++1w4IFJp5FSoOIvkrBNm+AHPwgncd18c9JppFToA1+RhP3kJ2H+nldegaZNk04jpUJH/iIJWrIE7r0Xvve9cPF1kfqi4i+SoNtvh8aN4e67k04ipUbFXyQhb7wBY8fCjTfC3nsnnUZKjYq/SALcw5j+9u3h3/896TRSivSBr0gCJk4MH/A++CC0aJF0GilFOvIXqWcVFWFI54EHwjXXJJ1GSpWO/EXq2dix4QIt48dDI/0FSkKyOvI3s/5mttDMFpvZLRnWtzSzf5jZLDOba2ZXxR9VpPBVVsLPfgbf+AYMHJh0GillNR53mFlD4AHgNGAlMN3MJrr7vJRm/wbMc/ezzawdsNDM/uLuW3OSWqRAPf00zJ8fpnLQ/D2SpGx+/Y4CFrv70qiYjwHSj1kcaGFmBjQHPgEqYk0qUuDc4ac/hZ49YdCgpNNIqTN3r76B2SCgv7tfEz0eAvRx9+EpbVoAE4EDgRbARe7+bIbXGgoMBWjXrl3vsrKyuL6PnNm0aRPNmzdPOkaNlDNeucg5dWobbrvtG9x883z69/+gzq9XyvsyFwol50knnfSWu9f9ag/uXu0CXACM
THk8BLgvrc0g4HeAAQcA7wJ7Vve6PXv29EIwZcqUpCNkRTnjFXfOykr3Pn3cu3Vz37o1ntcs1X2ZK4WSE3jTa6jb2SzZdPusBLqmPO4CrEprcxUwPsq2OCr+mphWJPK//wvTpsEtt4TpHESSlk3xnw70MLPuZtYEGEzo4kn1HnAKgJl1AHoBS+MMKlLIfvtb6NABrrwy6SQiQY2jfdy9wsyGA88DDYFR7j7XzIZF60cAdwOPmtkcQtfPze7+UQ5zixSMd94JF2u54w5N2Sz5I6tTTNx9EjAp7bkRKfdXAafHG02kONx/f+jqGTYs6SQiO2iksUgObdgAjzwCgwdDx45JpxHZQcVfJIceeSRcpvG665JOIrIzFX+RHNm2De67D449Fnr3TjqNyM5U/EVy5NlnYelSHfVLflLxF8mR+++HLl3gnHOSTiLydSr+Ijnw7rvw4oswdKhO6pL8pOIvkgOjRoVZO3VSl+QrFX+RmFVUhFE+/ftD1641txdJgoq/SMwmT4b339clGiW/qfiLxGzkyDCPz1lnJZ1EpGoq/iIxWr0annkm9PXrg17JZyr+IjF67LFwctfVVyedRKR6Kv4iMamsDF0+J54IPXoknUakeir+IjF57TVYskRH/VIYVPxFYvLEE7DHHnDeeUknEamZir9IDLZsgbIyOPfc8AYgku9U/EViMGkSrF8Pl12WdBKR7Kj4i8Rg9Ogwtv+UU5JOIpIdFX+ROlq3Loztv/hiaJTVhVFFkqfiL1JH48bB1q3q8pHCouIvUkejR8OBB8IRRySdRCR7Kv4idbBsGbzyCgwZAmZJpxHJnoq/SB08+WS4veSSZHOI7CoVf5E6KCuDo4+Gbt2STiKya1T8RWrpnXdg5ky46KKkk4jsOhV/kVoqKwu3gwYlm0OkNlT8RWqprAyOPRa6dEk6iciuU/EXqYX582HOHLjwwqSTiNSOir9ILYwbF4Z2qstHCpWKv0gtlJXB8cfD3nsnnUSkdlT8RXbR3LlhUZePFDIVf5FdNG4cNGgA55+fdBKR2lPxF9lF48bBCSdAx45JJxGpPRV/kV3wzjswb54u1SiFL6vib2b9zWyhmS02s1uqaNPPzGaa2Vwz+2e8MUXyw9//Hm4HDkw2h0hd1XjpCTNrCDwAnAasBKab2UR3n5fSphXwR6C/u79nZu1zlFckURMmhKmb99kn6SQidZPNkf9RwGJ3X+ruW4ExQPpxzyXAeHd/D8Dd18YbUyR5a9bA1KlwzjlJJxGpu2wuOtcZWJHyeCXQJ61NT6CxmZUDLYB73f3x9Bcys6HAUIB27dpRXl5ei8j1a9OmTcoZo0LO+cwznXDvRefO0ykv/yyZYCkKeV/mo0LJGRt3r3YBLgBGpjweAtyX1uZ+4HVgD6AtsAjoWd3r9uzZ0wvBlClTko6QFeWMV6acAwa477efe2Vl/efJpJD3ZT4qlJzAm15D3c5myabbZyXQNeVxF2BVhjaT3f0zd/8IeBn4Zm3fkETyzcaN8NJLoctHV+ySYpBN8Z8O9DCz7mbWBBgMTExr83fgeDNrZGa7E7qF5scbVSQ5zz0XLtKu/n4pFjX2+bt7hZkNB54HGgKj3H2umQ2L1o9w9/lmNhmYDVQSuonezmVwkfo0YQK0bQt9+yadRCQe2Xzgi7tPAialPTci7fGvgV/HF00kP2zdCs8+G2bwbNgw6TQi8dAZviI1KC+HTz9Vl48UFxV/kRpMmAB77AGnnpp0EpH4qPiLVKOyMkzp0L8/7LZb0mlE4qPiL1KNN9+EVavU5SPFR8VfpBoTJoQPec88M+kkIvFS8RepxoQJ0K8ftG6ddBKReKn4i1Rh4UKYP19dPlKcVPxFqqC5+6WYqfiLVOHpp6F3b+jatea2IoVGxV8kg48/bsLrr6vLR4qXir9IBq++2gZQ8ZfipeIvksGrr7Zl//3hkEOSTiKSGyr+Imk+/RRmzGitufulqKn4i6R57jmoqGigLh8pair+Imkm
TIBWrbZyzDFJJxHJHRV/kRRbtoS5+/v2/Vhz90tRU/EXSVFeHq7Xe9xxHyYdRSSnVPxFUmyfu7937/VJRxHJKRV/kcj2ufvPOAOaNKlMOo5ITqn4i0TeeANWr9aJXVIaVPxFIhMmQKNGMGBA0klEck/FXySiufullKj4iwALFoT5+9XlI6VCxV+EcNQP8N3vJhpDpN6o+IsQiv+RR2rufikdKv5S8latgmnT1OUjpUXFX0rexInhVsVfSomKv5S88ePhgAPg4IOTTiJSf1T8paStWwdTpsB552nufiktKv5S0p59Fioq4Nxzk04iUr9U/KWkjR8Pe+8NRx2VdBKR+qXiLyVr82aYPDl80NtAfwlSYvQrLyXr+efh889Df79IqVHxl5L19NNhHp8TTkg6iUj9U/GXkvTll/CPf4TpHBo3TjqNSP3LqvibWX8zW2hmi83slmrafdvMtpnZoPgiisSvvBzWr9coHyldNRZ/M2sIPACcARwMXGxmXzsdJmr3K+D5uEOKxO3pp2H33eH005NOIpKMbI78jwIWu/tSd98KjAEGZmj3Q+ApYG2M+URiV1kZJnI74wzYbbek04gko1EWbToDK1IerwT6pDYws87AucDJwLereiEzGwoMBWjXrh3l5eW7GLf+bdq0STljlA85587dk9Wrj6BXr3mUl2c+VsmHnDUphIygnHnL3atdgAuAkSmPhwD3pbUZBxwd3X8UGFTT6/bs2dMLwZQpU5KOkBXlzN5NN7k3buy+bl3VbfIhZ00KIaO7csYNeNNrqK/ZLNkc+a8EUmc57wKsSmtzJDDGwuQobYEBZlbh7hNq95Ykkhvuob//5JOhVauk04gkJ5s+/+lADzPrbmZNgMHAxNQG7t7d3bu5ezfgb8APVPglH82ZA0uW6MQukRqP/N29wsyGE0bxNARGuftcMxsWrR+R44wisRk/PszeOTDTkAWREpJNtw/uPgmYlPZcxqLv7lfWPZZIbowbF87o7dAh6SQiydIZvlIy5s6FefPgwguTTiKSPBV/KRljx4bZO9XfL6LiLyXCHcrK4MQToWPHpNOIJE/FX0rCnDmwcKG6fES2U/GXklBWpi4fkVQq/lL0tnf5nHwytG+fdBqR/KDiL0Vv5kxYtEhdPiKpVPyl6JWVQcOGmrtfJJWKvxQ19zDE8+SToW3bpNOI5A8VfylqU6fCu+/CpZcmnUQkv6j4S1EbPTpcsEVdPiI7U/GXorV1a+jyGTgQ9twz6TQi+UXFX4rW5MnwyScwZEjSSUTyj4q/FK3Ro6FdOzjttKSTiOQfFX8pShs2wMSJMHgwNG6cdBqR/KPiL0Xpqadgyxa47LKkk4jkJxV/KUpPPAE9e8K3v510EpH8pOIvRWfFCigvD0f9ZkmnEclPKv5SdB59NNyqy0ekair+UlQqK+Hhh8MIn+7dk04jkr9U/KWovPQSLF8O11yTdBKR/KbiL0Vl5Eho0yac1SsiVVPxl6Lx4YcwYQJcfjk0bZp0GpH8puIvRePxx+HLL9XlI5INFX8pCu6hy6dvXzj44KTTiOS/RkkHEInDa6/BggUwalTSSUQKg478pSiMGAEtWsAFFySdRKQwqPhLwVu9Oszbf9VV0Lx50mlECoOKvxS8ESOgogJ++MOkk4gUDhV/KWhbtoTiP2AAHHBA0mlECoeKvxS0sWNh7Vq47rqkk4gUFhV/KVjucO+9YWjnqacmnUaksGiopxSsV1+FGTNCt4+mbhbZNTryl4J1773QurWmbhapjayKv5n1N7OFZrbYzG7JsP5SM5sdLa+Z2Tfjjyqyw6JFMH48XHst7LFH0mlECk+Nxd/MGgIPAGcABwMXm1n6CfTvAie6+2HA3cBDcQcVSfXLX0KTJnDDDUknESlM2Rz5HwUsdvel7r4VGAPsNGGuu7/m7uuih68DXeKNKbLD8uVhErdrr4WOHZNOI1KYzN2rb2A2COjv7tdEj4cAfdx9eBXtbwIO3N4+bd1QYChA
u3btepeVldUxfu5t2rSJ5gVw2mgp5fz973vw7LOd+Otfp9Gu3ZaYku2sEPZnIWQE5YzbSSed9Ja7H1nnF3L3ahfgAmBkyuMhwH1VtD0JmA+0qel1e/bs6YVgypQpSUfISqnkfP9996ZN3YcOjSdPVQphfxZCRnfljBvwptdQX7NZshnquRLomvK4C7AqvZGZHQaMBM5w94/r8H4kUqXf/CZM5XDzzUknESls2fT5Twd6mFl3M2sCDAYmpjYws32A8cAQd38n/pgi4UpdI0bApZfCfvslnUaksNV45O/uFWY2HHgeaAiMcve5ZjYsWj8C+DHQBvijhbNtKjyOPimRFHffDVu3wm23JZ1EpPBldYavu08CJqU9NyLl/jWALp4nObNoETz4YLhEY69eSacRKXw6w1cKwq23houy33ln0klEioOKv+S9116Dp56C//gPjesXiYuKv+Q1d7jpJujUCW68Mek0IsVDs3pKXhs/HqZOhT//WXP4iMRJR/6StzZuDHP3HHooXHll0mlEiouO/CVv/dd/wcqV4WpdjfSbKhIrHflLXnrjDfjDH+D734djjkk6jUjxUfGXvPPllzB0aPiQ9+c/TzqNSHHSP9OSd373O5g1KwzvbNky6TQixUlH/pJX3n4b7rgDBg6Ec89NOo1I8VLxl7yxeTNcdFE42v/Tn3RRdpFcUreP5I0bboB58+D556FDh6TTiBQ3HflLXhg3Dh56KMzTf/rpSacRKX4q/pK4pUvD9Xj79AnTNotI7qn4S6LWrYMzz4SGDeHJJ6Fx46QTiZQG9flLYrZuhfPPhyVL4KWXoHv3pBOJlA4Vf0mEOwwbBlOmwBNPwAknJJ1IpLSo20cScddd8MgjYUz/ZZclnUak9Kj4S727665wRa4rrgjFX0Tqn7p9pF49+mg3HnssFP6HH9aJXCJJ0ZG/1At3+PGP4bHHunHVVaHwN2yYdCqR0qUjf8m5LVvCLJ2PPw4DBqxm5MhONNBhh0iiVPwlp9auhfPOg1dfDX39xx23kAYNOiUdS6Tk6fhLcuatt8JZuzNmQFlZuDKX+vhF8oOKv8Ru27ZwEZajj4aKCnj5ZbjggqRTiUgqdftIrJYuhcsvD908F14II0ZA69ZJpxKRdDryl1hs3hzG7h9yCMyZE87aHTNGhV8kX+nIX+qksjJcbvGmm+C992DwYPj1r6FLl6STiUh1dOQvtVJZGebgP/zw0L3TqhX8859hZk4VfpH8p+Ivu2TTpnDRlUMPDUV/y5bQxfPWW5qcTaSQqNtHauQehms+/DCMHg0bN8I3vxn69AcN0pm6IoVIxV8ycoeZM0PXTllZmHO/WbNwgfVhw8L4fY3ZFylcKv7ylVWrQr/95MnwwguwZk04qj/lFLj1Vjj3XNhrr6RTikgcVPxL1Lp1MHt2WKZNC+Pyly0L6/baC047Db7zHTj7bGjbNtGoIpIDKv5FrKIiDL9csiScfLVkCcybFwr+ihU72nXsCMceCz/6Ubjt3Vv9+CLFLqvib2b9gXuBhsBId/9l2nqL1g8ANgNXuvuMmLOWPPdwMtWGDbB+fVg+/hhWr4apU/dl7NjQVbN6dVjefz9MtbBdkybQs2cYlXPYYeFD28MOC8Vf/fcipaXG4m9mDYEHgNOAlcB0M5vo7vNSmp0B9IiWPsCD0W1s3Hfcbl9qehzH13zySRPWrNnxuKIiLNu27bifutT0/Nat8PnnO5Yvvtj5cfpzGzfuKPTr14fXyKw7bdpAp06hmPfsCV27wv77w377hdvOndFUyiICZHfkfxSw2N2XApjZGGAgkFr8BwKPu7sDr5tZKzPr5O6rq3rRRYta0KxZdkU5WX1zvoWmTWG33XYszZrtuG3XDnr0CCdRtWoFLVvuuN+qVZg+oVMnWLDgn5x22ok5zyoixSGb4t8ZSOkhZiVfP6rP1KYzsFPxN7OhwNDo4ZYtW+ztXUqbjLbAR7ncwJYtYVm/vk4vk/OcMVHO+BRCRlDO
uPWK40WyKf6ZeoPTj8ezaYO7PwQ8BGBmb7r7kVlsP1HKGS/ljE8hZATljJuZvRnH62TTA7wS6JryuAuwqhZtREQkT2RT/KcDPcysu5k1AQYDE9PaTAQut+BoYEN1/f0iIpKsGrt93L3CzIYDzxOGeo5y97lmNixaPwKYRBjmuZgw1POqLLb9UK1T1y/ljJdyxqcQMoJyxi2WnObJD6cREZF6plHfIiIlSMVfRKQE5bT4m9kFZjbXzCrN7Mi0dbea2WIzW2hm36ni6/cysxfNbFF0m/MrwprZWDObGS3LzGxmFe2WmdmcqF0sQ692hZndaWbvp2QdUEW7/tE+XmxmtySQ89dmtsDMZpvZ02bWqop29b4/a9o30QCGP0TrZ5vZEfWRKy1DVzObYmbzo7+l6zK06WdmG1J+F35c3zmjHNX+DPNkf/ZK2U8zzexTM7s+rU0i+9PMRpnZWrMd5z9lWwNr9Xfu7jlbgIMIJySUA0emPH8wMAtoCnQHlgANM3z9fwO3RPdvAX6Vy7wZtn8P8OMq1i0D2tZnnrTt3wncVEObhtG+3Q9oEu3zg+s55+lAo+j+r6r6Gdb3/sxm3xAGMTxHOI/laGBaAj/nTsAR0f0WwDsZcvYDnqnvbLv6M8yH/Znhd2ANsG8+7E/gBOAI4O2U52qsgbX9O8/pkb+7z3f3hRlWDQTGuPsWd3+XMEroqCraPRbdfww4JydBM4gmq7sQeLK+tpkDX03N4e5bge1Tc9Qbd3/B3bfPSPQ64RyQfJDNvvlq2hJ3fx1oZWad6jOku6/2aJJEd98IzCecPV+IEt+faU4Blrj78gQzfMXdXwY+SXs6mxpYq7/zpPr8q5oOIl0Hj84XiG7b10O27Y4HPnD3RVWsd+AFM3srmrYiCcOjf59HVfHvYLb7ub58j3Dkl0l9789s9k1e7T8z6wYcDkzLsPoYM5tlZs+Z2SH1m+wrNf0M82p/Es5ZqurgLh/2J2RXA2u1X+s8n7+ZvQR0zLDqdnf/e1VfluG5ehtzmmXmi6n+qP9Yd19lZu2BF81sQfTOXS85CTOn3k3Yb3cTuqi+l/4SGb429v2czf40s9uBCuAvVbxMzvdnmtimLakPZtYceAq43t0/TVs9g9B1sSn67GcCYYbd+lbTzzCf9mcT4LvArRlW58v+zFat9mudi7+7n1qLL8t2OogPLJodNPr3cG1tMqarKbOZNQLOA3pX8xqrotu1ZvY04V+vWItVtvvWzP4MPJNhVb1Mu5HF/rwCOAs4xaNOygyvkfP9maZgpi0xs8aEwv8Xdx+fvj71zcDdJ5nZH82srbvX6yRlWfwM82J/Rs4AZrj7B+kr8mV/RrKpgbXar0l1+0wEBptZUzPrTnhXfaOKdldE968AqvpPIm6nAgvcfWWmlWa2h5m12H6f8KFmvc5QmtZXem4V289mao6csnAhoJuB77r75iraJLE/C2Lakuizp4eB+e7+2yradIzaYWZHEf6uP66/lFn/DBPfnymq/M8+H/ZnimxqYO3+znP86fW5hHelLcAHwPMp624nfEK9EDgj5fmRRCODgDbA/wCLotu9cpk3JcOjwLC05/YGJkX39yN8oj4LmEvo3qjvkQFPAHOA2dEPulN6zujxAMIIkSUJ5VxM6I+cGS0j8mV/Zto3wLDtP3vCv9MPROvnkDJirR7333GEf+Fnp+zDAWk5h0f7bRbhQ/W+CeTM+DPMt/0Z5didUMxbpjyX+P4kvBmtBr6M6ubVVdXAOP7ONb2DiEgJ0hm+IiIlSMVfRKQEqfiLiJQgFX8RkRKk4i8iUoJU/EVESpCKv4hICfo/69hGwR40LjQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Steep near z=0 (highly sensitive) and flat on both sides, which suppresses the influence of noise\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "plt.figure()\n",
    "plt.axis([-10,10,0,1])\n",
    "plt.grid(True)\n",
    "X=np.arange(-10,10,0.1)\n",
    "y=1/(1+np.e**(-X))\n",
    "plt.plot(X,y,'b-')\n",
    "plt.title(\"Sigmoid function\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "81ff3fc0",
   "metadata": {},
   "source": [
    "This function has several useful properties:\n",
    "\n",
    "$$\n",
    "\\lim_{z\\to-\\infty} g(z)=0 \\\\\n",
    "\\lim_{z\\to+\\infty}g(z)=1\n",
    "$$\n",
    "$$\n",
    "g'(z)=\\frac{d}{dz} \\frac{1}{1+e^{-z}}=\\frac{1}{1+e^{-z}}(1-\\frac{1}{1+e^{-z}})=  g(z)(1-g(z))\n",
    "$$\n",
    "\n",
    "The last property in particular will be used in the derivation later."
   ]
  },
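  {
   "cell_type": "markdown",
   "id": "a3f09c1d",
   "metadata": {},
   "source": [
    "The derivative identity is stated in one line, so as a worked sketch the intermediate steps can be written out using only the definition of $g(z)$ and the chain rule:\n",
    "\n",
    "$$\n",
    "g'(z)=\\frac{d}{dz}\\left(1+e^{-z}\\right)^{-1}=\\frac{e^{-z}}{\\left(1+e^{-z}\\right)^{2}}=\\frac{1}{1+e^{-z}}\\cdot\\frac{e^{-z}}{1+e^{-z}}\n",
    "$$\n",
    "\n",
    "Since $1-g(z)=\\frac{e^{-z}}{1+e^{-z}}$, the last product is exactly $g(z)(1-g(z))$. At $z=0$ this gives $g'(0)=\\frac{1}{2}\\cdot\\frac{1}{2}=\\frac{1}{4}$, the largest slope of the curve, while for large $|z|$ one of the two factors approaches 0 and the slope vanishes."
   ]
  },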
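  {
   "cell_type": "markdown",
   "id": "8cf1612f",
   "metadata": {},
   "source": [
    "\n",
    "* **The essence of logistic regression**\n",
    "\n",
    "As stated at the outset, **logistic regression applies a logistic function on top of linear regression: first take a linear combination of the features, then use $g(z)$ as the hypothesis function for prediction.** $g(z)$ maps any real value into the interval $(0,1)$. Substituting the linear regression expression into $g(z)$ gives the logistic regression expression, which we build up step by step.\n",
    "\n",
    "The prediction function is usually written $h_\\theta(x)$. In one dimension the linear prediction function, which is also the linear decision boundary of the binary classification problem above, is\n",
    "\n",
    "$$\n",
    "h_\\theta(x)=\\theta_0+\\theta_1x_1\n",
    "$$\n",
    "\n",
    "In $n$ dimensions, with the convention $x_0=1$, this becomes\n",
    "\n",
    "$$\n",
    "h_\\theta(x)=\\theta_0+\\theta_1x_1+\\cdots+\\theta_nx_n=\\sum_{i=0}^{n}\\theta_ix_i=\\theta^Tx\n",
    "$$\n",
    "\n",
    "Passing this linear combination through $g$, the prediction function in vector form is\n",
    "\n",
    "$$\n",
    "h_\\theta(x) = g(\\theta^T x) = \\frac{1}{1+e^{-\\theta^T x}}\n",
    "$$\n",
    "\n",
    "We have thus constructed a **new prediction function** $h_\\theta(x)$ that first takes a linear combination and then applies $g(z)$, which also maps the output range from $(-\\infty,+\\infty)$ to $(0,1)$."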
   ]
  },
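  {
   "cell_type": "markdown",
   "id": "0675ae32",
   "metadata": {},
   "source": [
    "### 1.3 Soft Classification with Logistic Regression\n",
    "\n",
    "The steps below closely mirror those for linear regression: find the regression function, apply gradient descent, and derive the update rule.\n",
    "\n",
    "### 1.3.1 What is soft classification?\n",
    "\n",
    "Statistics distinguishes two kinds of models: probabilistic and non-probabilistic.\n",
    "\n",
    "Probabilistic model: takes the form of a conditional distribution $P(y|x)$; after training, the output for an input $x$ is a probability for each possible value of $y$.\n",
    "\n",
    "Non-probabilistic model: takes the form of a decision function, i.e. a mapping from the input $x$ to the output $y$, and the output is unique.\n",
    "\n",
    "Soft classification: uses a probabilistic model that outputs a probability for each class; the final label is the class with the highest probability.\n",
    "\n",
    "Hard classification: uses a non-probabilistic model; the label is simply the output of the decision function."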
   ]
  },
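  {
   "cell_type": "markdown",
   "id": "5d2e7b40",
   "metadata": {},
   "source": [
    "As a small worked example of the difference, take hypothetical parameters $\\theta_0=-5$, $\\theta_1=1$ (chosen purely for illustration, not fitted to any data) and a tumor of size $x_1=6$:\n",
    "\n",
    "$$\n",
    "\\theta^Tx=-5+1\\cdot 6=1,\\qquad h_\\theta(x)=\\frac{1}{1+e^{-1}}\\approx 0.731\n",
    "$$\n",
    "\n",
    "A soft classifier reports the probabilities themselves, roughly 0.731 for class 1 and 0.269 for class 0, and only then picks the more probable class. A hard classifier would return just the label 1, with no indication of how confident that decision is."
   ]
  },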
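  {
   "cell_type": "markdown",
   "id": "8b9de6bf",
   "metadata": {},
   "source": [
    "### 1.3.2 How do we obtain the logistic regression expression from the prediction function?\n",
    "\n",
    "Since\n",
    "$$\n",
    "P(y=1|x,\\theta) = h_\\theta(x) \\\\\n",
    "P(y=0|x,\\theta) = 1 - h_\\theta(x)\n",
    "$$\n",
    "\n",
    "we can combine the two expressions into one (when $y=1$ only the first factor remains, otherwise only the second):\n",
    "\n",
    "$$\n",
    "P(y|x,\\theta) = (h_\\theta(x))^y (1 - h_\\theta(x))^{1-y}\n",
    "$$\n",
    "\n",
    "Here $x$ and $y$ actually stand for the whole training set, $[x_1, \\dots, x_m]$ and $[y_1, \\dots, y_m]$. Assuming the samples are independent, the likelihood factorizes as\n",
    "$$\n",
    "L(\\theta)=\\prod_{i=1}^{m}P(y_i|x_i,\\theta)=\\prod_{i=1}^{m}(h_\\theta(x_i))^{y_i}(1-h_\\theta(x_i))^{1-y_i}\n",
    "$$\n",
    "\n",
    "This step parallels linear regression: we obtain the regression expression, build the likelihood function, perform maximum likelihood estimation, and derive the update rule. For the concepts behind maximum likelihood, review the parameter estimation part of probability theory.\n",
    "\n",
    "### 1.3.3 How do we obtain the update rule?\n",
    "\n",
    "Taking the logarithm of the likelihood gives (the superscripts here correspond to the subscripts in the formula above)\n",
    "![LogLoss](images/eq_logloss.png)\n",
    "\n",
    "Next, take the partial derivative with respect to $\\theta$ to find the best parameters (using $\\theta_j$ as the example)\n",
    "![LogLossDiff](images/eq_logloss_diff.png)\n",
    "\n",
    "* Note that $y_i$ is not a function of $\\theta$, so the first step only needs to expand the derivative of $\\log(g(\\theta^Tx))$\n",
    "* The second step uses the sigmoid derivative property from earlier\n",
    "* While differentiating, keep in mind that $\\theta^Tx$ is just a sum\n",
    "\n",
    "**Since we want the maximum likelihood estimate, i.e. the $\\theta$ that maximizes the likelihood $L(\\theta)$ (equivalently, its logarithm), we use gradient ascent**\n",
    "\n",
    "$$\n",
    "\\theta_j := \\theta_j + \\alpha\\frac{\\partial \\log L(\\theta)}{\\partial \\theta_j} = \\theta_j + \\alpha\\sum_{i=1}^{m}(y^i - h_\\theta(x^i)) x_j^i\n",
    "$$\n",
    "\n",
    "**This is our core update strategy**"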
   ]
  }
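  ,{
   "cell_type": "markdown",
   "id": "e41b8a26",
   "metadata": {},
   "source": [
    "The derivation in 1.3.3 relies on two formulas that are shown as images. As a sketch reconstructed from the likelihood $L(\\theta)$ defined in 1.3.2 (the symbol $\\ell(\\theta)$ for the log-likelihood is an assumption; the images may use a different name), they can be written as\n",
    "\n",
    "$$\n",
    "\\ell(\\theta)=\\log L(\\theta)=\\sum_{i=1}^{m}\\left[y^i\\log h_\\theta(x^i)+(1-y^i)\\log\\left(1-h_\\theta(x^i)\\right)\\right]\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\frac{\\partial \\ell(\\theta)}{\\partial \\theta_j}=\\sum_{i=1}^{m}\\left[\\frac{y^i}{h_\\theta(x^i)}-\\frac{1-y^i}{1-h_\\theta(x^i)}\\right]h_\\theta(x^i)\\left(1-h_\\theta(x^i)\\right)x_j^i=\\sum_{i=1}^{m}\\left(y^i-h_\\theta(x^i)\\right)x_j^i\n",
    "$$\n",
    "\n",
    "The code cell below is a minimal batch gradient ascent sketch of the resulting update in NumPy. The function names `sigmoid` and `fit_logistic`, the toy tumor sizes, the step size `alpha=0.1`, the iteration count, and the averaging of the gradient over the $m$ samples are all illustrative assumptions rather than part of the derivation. Because the toy data are perfectly separable, the parameters keep growing slowly with more iterations, so the printed values depend on `n_iters`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b6c03f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    # Logistic function g(z) = 1 / (1 + exp(-z))\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def fit_logistic(X, y, alpha=0.1, n_iters=1000):\n",
    "    # Batch gradient ascent on the log-likelihood (illustrative sketch).\n",
    "    # X: (m, n) features without the bias column; y: (m,) labels in {0, 1}.\n",
    "    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1\n",
    "    theta = np.zeros(Xb.shape[1])\n",
    "    for _ in range(n_iters):\n",
    "        grad = Xb.T @ (y - sigmoid(Xb @ theta))  # sum_i (y^i - h(x^i)) x_j^i\n",
    "        theta += alpha * grad / len(y)  # averaged step: an assumption, keeps alpha independent of m\n",
    "    return theta\n",
    "\n",
    "# Hypothetical toy data: tumor sizes and labels (0 = benign, 1 = malignant)\n",
    "X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0], [7.0], [8.0], [9.0]])\n",
    "y = np.array([0, 0, 0, 0, 1, 1, 1, 1])\n",
    "\n",
    "theta = fit_logistic(X, y)\n",
    "probs = sigmoid(np.hstack([np.ones((len(X), 1)), X]) @ theta)\n",
    "print('theta:', theta)\n",
    "print('P(y=1|x):', np.round(probs, 3))  # soft classification: probabilities\n",
    "print('labels:', (probs >= 0.5).astype(int))  # thresholded at 0.5"
   ]
  }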
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
